
    MEGA: Multilingual Evaluation of Generative AI

    Generative AI models have shown impressive performance on many Natural Language Processing tasks such as language understanding, reasoning, and language generation. A central question for the AI community today concerns the capabilities and limits of these models, and evaluating generative AI is clearly very challenging. Most studies on generative LLMs have been restricted to English, and it is unclear how capable these models are at understanding and generating text in other languages. We present MEGA, the first comprehensive benchmarking of generative LLMs, which evaluates models on standard NLP benchmarks covering 16 NLP datasets across 70 typologically diverse languages. We compare the performance of generative LLMs, including ChatGPT and GPT-4, to state-of-the-art (SOTA) non-autoregressive models on these tasks to determine how well generative models perform compared to the previous generation of LLMs. We present a thorough analysis of model performance across languages and tasks and discuss the challenges in improving the performance of generative LLMs on low-resource languages. We create a framework for evaluating generative LLMs in the multilingual setting and provide directions for future progress in the field.
    Comment: EMNLP 202

    Severe communication delays are independent of seizure burden and persist despite contemporary treatments in SCN1A+ Dravet syndrome: Insights from the ENVISION natural history study

    Objective: Dravet syndrome (DS) is a developmental and epileptic encephalopathy characterized by high seizure burden, treatment‐resistant epilepsy, and developmental stagnation. Family members rate communication deficits among the most impactful disease manifestations. We evaluated seizure burden and language/communication development in children with DS.

    Methods: ENVISION was a prospective, observational study evaluating children with DS associated with SCN1A pathogenic variants (SCN1A+ DS) who were enrolled at age ≤5 years. Seizure burden and antiseizure medications were assessed every 3 months, and communication and language every 6 months, with the Bayley Scales of Infant and Toddler Development 3rd edition and the parent‐reported Vineland Adaptive Behavior Scales 3rd edition. We report data from the first year of observation, including analyses stratified by age at Baseline: 0:6–2:0 years:months (Y:M; youngest), 2:1–3:6 Y:M (middle), and 3:7–5:0 Y:M (oldest).

    Results: Between December 2020 and March 2023, 58 children with DS were enrolled at 16 sites internationally. Median follow‐up was 17.5 months (range = .0–24.0), with 54 of 58 (93.1%) followed for at least 6 months and 51 of 58 (87.9%) for 12 months. Monthly countable seizure frequency (MCSF) increased with age (median [minimum–maximum] = 1.0 in the youngest [1.0–70.0] and middle [1.0–242.0] age groups and 4.5 [.0–2647.0] in the oldest age group) and remained high despite the use of currently approved antiseizure medications. Language/communication delays were observed early, and developmental stagnation occurred after age 2 years with both instruments. In predictive modeling, chronologic age was the only significant covariate of seizure frequency (effect size = .52, p = .024). MCSF, number of antiseizure medications, age at first seizure, and convulsive status epilepticus were not predictors of language/communication raw scores.

    Significance: In infants and young children with SCN1A+ DS, language/communication delay and stagnation were independent of seizure burden. Our findings emphasize that the optimal therapeutic window to prevent language/communication delay is before 3 years of age.